Abstract. The nonlinear filter for an ergodic signal observed in white noise is said to achieve 
maximal accuracy if the stationary filtering error vanishes as the signal to noise ratio diverges. We 
give a general characterization of the maximal accuracy property in terms of various systems theoretic 
notions. When the signal state space is a finite set, explicit necessary and sufficient conditions are 
obtained, while the linear Gaussian case reduces to a classic result of Kwakernaak and Sivan (1972). 
1. Introduction. Let $(X_t)_{t\ge 0}$ be a signal of interest, which we model as an 
ergodic Markov process. It is often the case that the detection of such a signal 
is imperfect: only some function of the signal may be directly observable, and the 
observations are additionally corrupted by additive white noise. That is, one observes 
in practice the integrated observation process 

$$Y_t = \int_0^t h(X_s)\,ds + \kappa B_t,$$

where $B_t$ is a Wiener process independent of $(X_t)_{t\ge 0}$, $\kappa > 0$ determines the strength 
of the corrupting noise, and $h$ is a (possibly nonlinear and noninvertible) function 
of the signal. When only the imperfect observations $(Y_s)_{s\le t}$ are available, the exact 
value of the signal $X_t$ can certainly not be detected with arbitrary precision, even 
when $t$ is very large (so that we have a long observation history at our disposal). 

To improve the accuracy of our detector, we must decrease the strength of the 
corrupting noise. It is intuitively obvious that as $\kappa \to 0$, we will eventually be able 
to determine precisely the value of $h(X_t)$. However, when the function $h$ is not 
invertible (as is the case in many engineering systems of practical interest), this does 
not necessarily imply that we will be able to determine precisely the value of the signal 
itself. The optimal estimate of the signal $X_t$, given the observation history $(Y_s)_{s\le t}$, is 
called the nonlinear filter. We say that the filter achieves maximal accuracy if, as the 
noise strength vanishes, the stationary filtering error vanishes also, i.e., if as $t \to \infty$ 
and $\kappa \to 0$, we are able to determine precisely the value of the signal. 

When do nonlinear filters achieve maximal accuracy? In the special linear Gaussian 
case, where the nonlinear filter reduces to the Kalman-Bucy filter, this question 
was first posed and resolved in a well known paper of Kwakernaak and Sivan [12]. 
Somewhat surprisingly, the answer is far from trivial and the proof given by Kwakernaak 
and Sivan is reasonably involved. In fact, Kwakernaak and Sivan chiefly study 
the dual deterministic control problem with 'cheap' control. Their proof is not probabilistic 
in nature, but is based on a delicate analysis of the associated Riccati equation. 

Very little appears to be known beyond the linear Gaussian case. To the best of 
our knowledge the only nonlinear result is due to Zeitouni and Dembo [24], who study 
a special class of diffusion signals with nonlinear drift term and linear observations. 
Their result, however, also reduces to the linear Gaussian case: the key step in the 
proof is to estimate the filtering error by that of an auxiliary Kalman-Bucy filter. 

The purpose of this paper is to investigate the maximal accuracy problem in a 
general setting. After setting up the problem and introducing the relevant concepts in 
section 2, we proceed in section 3 to relate the maximal accuracy property of the filter 
to several systems theoretic notions (theorem 3.1 below). The proof of our main result 
follows from simple probabilistic arguments. Then, in section 4, we apply our general 
result to provide a complete characterization of the maximal accuracy property for 
the case where the signal is a finite state Markov process. The resulting necessary 
and sufficient condition (observability of the model after time reversal, together with 
a condition of the graph coloring type) is easily verified, but is surprisingly quite 
different in nature from the result for linear Gaussian systems. 

Finally, in section 5, we revisit the linear Gaussian setting and provide a complete 
proof of the result of Kwakernaak and Sivan using our general characterization. 
Though this does not lead to new results, our approach does not use the explicit form 
of the filtering equations and some parts of the proof are significantly simpler than 
that of [12]. We believe that our approach takes a little of the mystery out of the result 
of Kwakernaak and Sivan by placing it within a general probabilistic framework. 

Acknowledgment. The problem studied in this paper was posed to me by Prof. 
Ofer Zeitouni during a visit to the University of Minnesota in October 2008. I am 
indebted to him for arranging this visit and for our many subsequent discussions on 
this topic, without which this paper would not have been written. 

2. Preliminaries. We suppose that defined on a probability space $(\Omega,\mathcal{F},\mathbf{P})$ is 
a stationary Markov process $(X_t)_{t\in\mathbb{R}}$ with càdlàg sample paths in the Polish state 
space $E$, and we denote its stationary measure as $\mathbf{P}(X_0\in A)=\pi(A)$. Moreover, 
we presume that the probability space supports an $n$-dimensional two-sided Wiener 
process $(B_t)_{t\in\mathbb{R}}$ that is independent of $(X_t)_{t\in\mathbb{R}}$. Let us define for every $\kappa>0$ the 
$\mathbb{R}^n$-valued observation process $(Y_t^\kappa)_{t\in\mathbb{R}}$ according to the expression 

$$Y_t^\kappa=\int_0^t h(X_s)\,ds+\kappa B_t,$$

where $h:E\to\mathbb{R}^n$ is a given observation function. $X_t$ is called the signal process, and 
$Y_t^\kappa$ is called the observations process. In addition, we introduce the following notation. 
Let $\tilde X_t=X_{-t}$ be the time reversed signal, and note that $\tilde X_t$ is again a stationary 
Markov process under $\mathbf{P}$ with invariant distribution $\pi$. We denote 

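$$\mathcal{F}^{X}_I=\sigma\{X_s:s\in I\},\qquad \mathcal{F}^{\tilde X}_I=\sigma\{\tilde X_s:s\in I\},$$

$$\mathcal{F}^{h(X)}_I=\sigma\{h(X_s):s\in I\},\qquad \mathcal{F}^{h(\tilde X)}_I=\sigma\{h(\tilde X_s):s\in I\}$$

for $I\subset\mathbb{R}$, while 

$$\mathcal{F}^{Y,\kappa}_{[a,b]}=\sigma\{Y^\kappa_s-Y^\kappa_a:s\in[a,b]\},\qquad \mathcal{F}^{Y,\kappa}_{[a,\infty[}=\bigvee_{b>a}\mathcal{F}^{Y,\kappa}_{[a,b]},\qquad \mathcal{F}^{Y,\kappa}_{]-\infty,b]}=\bigvee_{a<b}\mathcal{F}^{Y,\kappa}_{[a,b]}$$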

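for $a<b$. Finally, for any probability measure $\mu\ll\pi$, we define 

$$\mathbf{P}^\mu(A)=\mathbf{E}\Big(I_A\,\frac{d\mu}{d\pi}(X_0)\Big).$$

Note that $\mathbf{P}^\mu(X_0\in A)=\mathbf{P}^\mu(\tilde X_0\in A)=\mu(A)$ by construction, while 

$$\mathbf{E}^\mu(f(X_t)\,|\,\mathcal{F}^{X}_{]-\infty,s]})=\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{X}_{]-\infty,s]}),\qquad \mathbf{E}^\mu(f(\tilde X_t)\,|\,\mathcal{F}^{\tilde X}_{]-\infty,s]})=\mathbf{E}(f(\tilde X_t)\,|\,\mathcal{F}^{\tilde X}_{]-\infty,s]})$$

for any $t\ge s\ge 0$ and bounded function $f$ by the Bayes formula. Therefore, under $\mathbf{P}^\mu$, 
the one-sided processes $(X_t)_{t\ge 0}$ and $(\tilde X_t)_{t\ge 0}$ are still Markov with the same transition 
probabilities as under $\mathbf{P}$, but with the initial measure $\mu$ instead of $\pi$. 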

Remark 2.1. If we are given a transition semigroup for the Markov process 
$(X_t)_{t\ge 0}$, then we can construct $\mathbf{P}^\mu|_{\sigma\{(X_t,Y_t)_{t\ge 0}\}}$ even for $\mu\not\ll\pi$. However, the transition 
semigroup for the time reversed process $(\tilde X_t)_{t\ge 0}$ under $\mathbf{P}$ is defined implicitly only up 
to $\pi$-a.s. equivalence in terms of the regular conditional probabilities $\mathbf{P}(X_{-t}\in\cdot\,|\,X_0)$. 
Therefore, for the time reversed process, we cannot unambiguously define $\mathbf{P}^\mu(\tilde X_t\in A)$ 
for $\mu\not\ll\pi$. We will therefore restrict our attention throughout to probability measures 
$\mu\ll\pi$, except in remark 2.11 and lemmas 2.12 and 2.13 below where only $(X_t,Y_t)_{t\ge 0}$ 
under $\mathbf{P}^\mu$ is considered (and not the time reversed part). 

The nonlinear filter of the signal $X_t$ given the noisy observations $Y_s^\kappa$, $0\le s\le t$, 
is defined as the regular conditional probability $\mathbf{P}(X_t\in\cdot\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})$. By construction, 
the filter minimizes the mean square estimation error $e_t(f,\kappa)$ for every test function 
$f:E\to\mathbb{R}$ with $\int f^2\,d\pi<\infty$: i.e., 

$$e_t(f,\kappa)=\mathbf{E}\big(\{f(X_t)-\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})\}^2\big)$$

is minimal ($e_t(f,\kappa)\le\mathbf{E}(\{f(X_t)-F\}^2)$ for every $\mathcal{F}^{Y,\kappa}_{[0,t]}$-measurable $F$). 

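Lemma 2.2. For every test function $f$ with $\int f^2\,d\pi<\infty$ and every noise strength 
$\kappa>0$, the mean square error $e_t(f,\kappa)$ converges to the stationary error 

$$\lim_{t\to\infty}e_t(f,\kappa)=\mathbf{E}\big(\{f(X_0)-\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{Y,\kappa}_{]-\infty,0]})\}^2\big):=e(f,\kappa).$$
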

Proof. The result follows directly from the stationarity of P and the martingale 
convergence theorem. □ 

Our interest is in the behavior of $e(f,\kappa)$ in the limit $\kappa\to 0$ where the observation 
noise vanishes. In particular, we aim to understand when the filter achieves maximal 
accuracy. 

Definition 2.3. The filter is said to achieve maximal accuracy if $e(f,\kappa)\to 0$ 
as $\kappa\to 0$ whenever $\int f^2\,d\pi<\infty$, i.e., if the true location of the signal is revealed in 
the stationary limit when the observation noise vanishes asymptotically. 

Our main results relate the maximal accuracy problem to certain structural prop- 
erties of a systems theoretic flavor. We define these notions presently. 

Definition 2.4. The model is said to be reconstructible if 

$$\mu,\nu\ll\pi\ \text{and}\ \mu\ne\nu\quad\text{implies}\quad \mathbf{P}^\mu|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}\ne\mathbf{P}^\nu|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}.$$

Definition 2.5. The model is said to be strongly reconstructible if 

$$\mu,\nu\ll\pi\ \text{and}\ \mu\perp\nu\quad\text{implies}\quad \mathbf{P}^\mu|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}\perp\mathbf{P}^\nu|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}.$$

Definition 2.6. The model is said to be invertible if for every $t>s$, the random 
variable $X_t$ coincides $\mathbf{P}$-a.s. with a $\sigma(X_s)\vee\mathcal{F}^{h(X)}_{[s,t]}$-measurable random variable. 

Definition 2.7. The model is said to be stably invertible if for every $t$ the 
random variable $X_t$ coincides $\mathbf{P}$-a.s. with an $\mathcal{F}^{h(X)}_{]-\infty,t]}$-measurable random variable. 

Let us finish this section with some remarks on these definitions. 
Remark 2.8. In the setting of deterministic linear systems theory, the notion 
of reconstructibility dates back to Kalman M, see also [11]. Our definition 2.4 in 
the stochastic setting is close to a similar notion that plays an important role in 
the realization theory of stationary Gaussian processes [11], [13]. Reconstructibility is 
essentially the time reversed counterpart of the notion of observability [23], though as 
discussed above we must restrict to probability measures $\mu,\nu\ll\pi$. 



Remark 2.9. By the stationarity of $\mathbf{P}$, definitions 2.6 and 2.7 can be restricted 
without loss of generality to the case $t=0$. Thus invertibility means $X_0$ can be 
written as a function of $X_s$ at some previous time $s$ and all the intermediate noiseless 
observations $h(X_r)$, i.e., $X_0=F_s[X_s,(h(X_r))_{s\le r\le 0}]$ for any $s<0$. This idea is well 
known in the deterministic setting; see, e.g., [14] in the linear case and H in the 
nonlinear case. We think of the inverse $F_s$ as being 'stable' if it becomes independent 
of $X_s$ as $s\to-\infty$; it therefore makes sense to talk of stable inversion when $X_0$ can 
be written as a function $X_0=F_{-\infty}[(h(X_r))_{r\le 0}]$ of all past noiseless observations. 

Remark 2.10. Suppose that the model is invertible. Then certainly $X_0$ is $\mathbf{P}$-a.s. 
$\sigma\{X_u,h(X_r):u\le s,\ r\le 0\}$-measurable for every $s<0$. In particular, 

$$X_0\ \text{is}\ \mathbf{P}\text{-a.s.}\ \bigcap_{s<0}\big(\mathcal{F}^{X}_{]-\infty,s]}\vee\mathcal{F}^{h(X)}_{]-\infty,0]}\big)\text{-measurable}.$$

Now suppose that the left tail $\sigma$-field $\bigcap_{s<0}\mathcal{F}^{X}_{]-\infty,s]}$ is $\mathbf{P}$-trivial, i.e., the signal is 
ergodic in a weak sense E^] . Then it is tempting to exchange the order of intersection 
and supremum, as follows: 

$$X_0\ \text{is}\ \mathbf{P}\text{-a.s.}\ \bigcap_{s<0}\big(\mathcal{F}^{X}_{]-\infty,s]}\vee\mathcal{F}^{h(X)}_{]-\infty,0]}\big)\overset{?}{=}\Big(\bigcap_{s<0}\mathcal{F}^{X}_{]-\infty,s]}\Big)\vee\mathcal{F}^{h(X)}_{]-\infty,0]}=\mathcal{F}^{h(X)}_{]-\infty,0]}\text{-measurable}.$$

This would indicate that invertibility plus ergodicity implies stable invertibility. However, 
the exchange of intersection and supremum is not necessarily permitted, as an 
illuminating counterexample in || shows. This conclusion is therefore invalid. A further 
discussion of this problem can be found in [22]. In particular, it is evident that 
the present problem is closely related to the innovations problem which is discussed 
in [22]. Another closely related problem, that of the stability of the nonlinear filter, 
is discussed in detail in [21]; however, it should be noted that the nondegeneracy 
assumption made there is manifestly absent in the problems discussed here. 

Remark 2.11. Suppose the signal is not started in the stationary distribution, 
but in a distribution $\mu$ that is not even necessarily absolutely continuous with respect 
to $\pi$. In this setting, the maximal achievable accuracy problem is to determine whether 

$$e^\mu_t(f,\kappa)=\mathbf{E}^\mu\big(\{f(X_t)-\mathbf{E}^\mu(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})\}^2\big)$$

converges to zero as $t\to\infty$, $\kappa\to 0$ (in that order). However, we will presently 
show that if $\|\mathbf{P}^\mu(X_t\in\cdot)-\pi\|_{TV}\to 0$ as $t\to\infty$, this problem reduces to the 
stationary problem where $\mu=\pi$. In particular, when the signal is ergodic, the maximal 
achievable accuracy problem always reduces to the stationary case (by ergodic we mean 
$\|\mathbf{P}^\mu(X_t\in\cdot)-\pi\|_{TV}\to 0$ for all $\mu$). This strongly motivates our choice to study 
directly the stationary problem in the remainder of this paper. 

Let us now make these claims precise in the form of two lemmas. For simplicity, 
we concentrate on bounded functions, which is not a significant restriction. 

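Lemma 2.12. Suppose that $f$ is a bounded measurable function. Then for any $\kappa$ 

$$\|\mathbf{P}^\mu(X_t\in\cdot)-\pi\|_{TV}\xrightarrow{t\to\infty}0\quad\text{implies}\quad\limsup_{t\to\infty}e^\mu_t(f,\kappa)\le e(f,\kappa).$$

Thus $e(f,\kappa)\to 0$ as $\kappa\to 0$ implies $e^\mu_t(f,\kappa)\to 0$ as $t\to\infty$, $\kappa\to 0$ (in that order). 

Lemma 2.13. Suppose that $\|\mathbf{P}^\mu(X_t\in\cdot)-\pi\|_{TV}\to 0$ as $t\to\infty$ for all $\mu$ (i.e., the signal 
is ergodic). Then $e^\mu_t(f,\kappa)\to e(f,\kappa)$ as $t\to\infty$ for all $\kappa>0$, $\mu$, and bounded measurable $f$. 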

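Proof of lemmas 2.12 and 2.13. Let $P_t$ be the Markov semigroup of the signal, so that 
$\mu P_t=\mathbf{P}^\mu(X_t\in\cdot)$. We basically follow Kunita pi]. First, by Jensen's inequality 

$$e^\mu_{t+s}(f,\kappa)=\mathbf{E}^\mu(f(X_{t+s})^2)-\mathbf{E}^\mu\big(\mathbf{E}^\mu(f(X_{t+s})\,|\,\mathcal{F}^{Y,\kappa}_{[0,t+s]})^2\big)$$

$$\le\mathbf{E}^\mu(f(X_{t+s})^2)-\mathbf{E}^\mu\big(\mathbf{E}^\mu(f(X_{t+s})\,|\,\mathcal{F}^{Y,\kappa}_{[s,t+s]})^2\big)$$

$$=\mathbf{E}^{\mu P_s}(f(X_t)^2)-\mathbf{E}^{\mu P_s}\big(\mathbf{E}^{\mu P_s}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})^2\big)=e^{\mu P_s}_t(f,\kappa)$$

for $s,t\ge 0$. Now suppose that $\|\mu P_s-\pi\|_{TV}\to 0$. Then it follows trivially (as $f$ is 
bounded) that $\mathbf{E}^{\mu P_s}(f(X_t)^2)\to\mathbf{E}(f(X_t)^2)$. On the other hand, we can estimate 

$$\big|\mathbf{E}^{\mu P_s}\big(\mathbf{E}^{\mu P_s}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})^2\big)-\mathbf{E}\big(\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})^2\big)\big|$$

$$\le\|f\|_\infty^2\,\|\mu P_s-\pi\|_{TV}+2\|f\|_\infty\,\mathbf{E}\big(\big|\mathbf{E}^{\mu P_s}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})-\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})\big|\big).$$

Using the estimate $\mathbf{E}_{\mathbf{P}}(|\mathbf{E}_{\mathbf{P}}(X|\mathcal{G})-\mathbf{E}_{\mathbf{Q}}(X|\mathcal{G})|)\le 4\|X\|_\infty\|\mathbf{P}-\mathbf{Q}\|_{TV}$ |, theorem 3.1], it 
follows that the left hand side of this expression converges to zero as $s\to\infty$. We have therefore 
shown that if $\|\mu P_s-\pi\|_{TV}\to 0$, then $e^{\mu P_s}_t(f,\kappa)\to e_t(f,\kappa)$ as $s\to\infty$. In particular, 

$$\limsup_{t\to\infty}e^\mu_t(f,\kappa)=\limsup_{t\to\infty}\limsup_{s\to\infty}e^\mu_{t+s}(f,\kappa)\le\limsup_{t\to\infty}\limsup_{s\to\infty}e^{\mu P_s}_t(f,\kappa)=\limsup_{t\to\infty}e_t(f,\kappa)=e(f,\kappa).$$
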



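This proves lemma 2.12. For lemma 2.13, note that 

$$e^\mu_{t+s}(f,\kappa)=\mathbf{E}^\mu(f(X_{t+s})^2)-\mathbf{E}^\mu\big(\mathbf{E}^\mu\big(\mathbf{E}^\mu(f(X_{t+s})\,|\,\mathcal{F}^{Y,\kappa}_{[0,t+s]}\vee\mathcal{F}^{X}_{[0,s]})\,\big|\,\mathcal{F}^{Y,\kappa}_{[0,t+s]}\big)^2\big)$$

$$\ge\mathbf{E}^\mu(f(X_{t+s})^2)-\mathbf{E}^\mu\big(\mathbf{E}^\mu(f(X_{t+s})\,|\,\mathcal{F}^{Y,\kappa}_{[0,t+s]}\vee\mathcal{F}^{X}_{[0,s]})^2\big)$$

$$=\mathbf{E}^\mu(f(X_{t+s})^2)-\mathbf{E}^\mu\big(\mathbf{E}^\mu\big(\mathbf{E}^\mu(f(X_{t+s})\,|\,\mathcal{F}^{Y,\kappa}_{[s,t+s]}\vee\sigma(X_s))^2\,\big|\,\sigma(X_s)\big)\big)$$

$$=\mathbf{E}^{\mu P_s}\big[\mathbf{E}^{\delta_{X_0}}(f(X_t)^2)-\mathbf{E}^{\delta_{X_0}}\big(\mathbf{E}^{\delta_{X_0}}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})^2\big)\big]=\mathbf{E}^{\mu P_s}\big[e^{\delta_{X_0}}_t(f,\kappa)\big].$$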



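Thus evidently, we can estimate 

$$\liminf_{t\to\infty}e^\mu_t(f,\kappa)=\liminf_{t\to\infty}\liminf_{s\to\infty}e^\mu_{t+s}(f,\kappa)\ge\liminf_{t\to\infty}\mathbf{E}\big[e^{\delta_{X_0}}_t(f,\kappa)\big],$$

and it remains to establish that the latter limit equals $e(f,\kappa)$. But 

$$\big|\mathbf{E}\big[e^{\delta_{X_0}}_t(f,\kappa)\big]-e_t(f,\kappa)\big|=\big|\mathbf{E}\big(\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})^2-\mathbf{E}^{\delta_{X_0}}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})^2\big)\big|$$

$$\le 2\|f\|_\infty\,\mathbf{E}\big(\big|\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})-\mathbf{E}^{\delta_{X_0}}(f(X_t)\,|\,\mathcal{F}^{Y,\kappa}_{[0,t]})\big|\big)\xrightarrow{t\to\infty}0$$

using $\kappa>0$, ergodicity of the signal, and [21, theorem 6.6]. □ 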

We emphasize, in particular, that in the ergodic case the maximal achievable accuracy 
problem is completely equivalent to the stationary maximal achievable accuracy 
problem for any initial measure $\mu$. We may therefore concentrate on the stationary 
case without loss of generality, which we will do from now on. Note, however, that 
our results below do not themselves require ergodicity. 

3. A General Result. The purpose of this section is to prove the following 
general result, which relates the maximal achievable accuracy problem to the various 
systems theoretic notions introduced above. 

Theorem 3.1. The following conditions are equivalent: 

1. The filter achieves maximal accuracy. 

2. The filtering model is stably invertible. 

3. The filtering model is strongly reconstructible. 
Moreover, any of these conditions implies the following: 

4. The filtering model is invertible. 

5. The filtering model is reconstructible. 

It should be noted that often invertibility and reconstructibility are much easier to 
verify than stable invertibility or strong reconstructibility. However, our general result 
only shows that the former are necessary conditions for the filter to achieve maximal 
accuracy. In the next section, we will show that when the signal state space is a finite 
set, the filter achieves maximal accuracy if and only if the model is both invertible and 
reconstructible. This will allow us to give simple necessary and sufficient conditions 
which can be verified directly in terms of the model coefficients. In general, however, 
it need not be the case that invertibility and reconstructibility are sufficient for the 
filter to achieve maximal accuracy; see section 5.1 for a counterexample. 

The remainder of this section is devoted to the proof of theorem 3.1. 



3.1. Proof of 1 ⇔ 2. The key here is that we can characterize precisely the 
limit of $e(f,\kappa)$ as $\kappa\to 0$. 

Lemma 3.2. For every test function $f$ with $\int f^2\,d\pi<\infty$, we have 

$$e(f,\kappa)\xrightarrow{\kappa\to 0}e(f,0)=\mathbf{E}\big(\{f(X_0)-\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{h(X)}_{]-\infty,0]})\}^2\big):=e(f).$$



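Proof. Let $\kappa_\ell\searrow 0$ as $\ell\to\infty$. It suffices to show that $e(f,\kappa_\ell)\to e(f)$ as $\ell\to\infty$ 
for every such sequence. We will therefore fix an arbitrary sequence $\kappa_\ell\searrow 0$ in the 
remainder of the proof. 

Without loss of generality, we assume (enlarging the probability space if necessary) 
that $(\Omega,\mathcal{F},\mathbf{P})$ carries a countable sequence $(B^\ell_t)_{t\in\mathbb{R}}$, $\ell\ge 0$, of independent $n$-dimensional 
Wiener processes which are independent of $(X_t)_{t\in\mathbb{R}}$. Define 

$$W^\ell_t=\sum_{k=\ell}^{\infty}\sqrt{\kappa_k^2-\kappa_{k+1}^2}\;B^k_t,\qquad Z^\ell_t=\int_0^t h(X_s)\,ds+W^\ell_t.$$

Note that it is easily established that the sum in the expression for $W^\ell_t$ is a.s. convergent 
uniformly on compact time intervals, and that the limit is a Wiener process with 
covariance $\kappa_\ell^2 I$. The process $(X_t,Y_t^{\kappa_\ell})_{t\in\mathbb{R}}$ therefore has the same law as $(X_t,Z^\ell_t)_{t\in\mathbb{R}}$ 
under $\mathbf{P}$, and in particular (here $\mathcal{F}^{Z,\ell}_{]-\infty,0]}=\sigma\{Z^\ell_t:t\le 0\}$) 

$$e(f,\kappa_\ell)=\mathbf{E}\big(\{f(X_0)-\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{Z,\ell}_{]-\infty,0]})\}^2\big).$$

But by the independence of $B^\ell$ and the signal, and as $Z^0-Z^\ell$ is a linear combination 
of $B^0,\ldots,B^{\ell-1}$, evidently 

$$\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{Z,\ell}_{]-\infty,0]})=\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{Z,0}_{]-\infty,0]}\vee\mathcal{F}^{B,0}_{]-\infty,0]}\vee\cdots\vee\mathcal{F}^{B,\ell-1}_{]-\infty,0]})\quad\mathbf{P}\text{-a.s.},$$

where $\mathcal{F}^{B,\ell}_{]-\infty,0]}=\sigma\{B^\ell_t:t\le 0\}$. Therefore 

$$\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{Z,\ell}_{]-\infty,0]})\xrightarrow{\ell\to\infty}\mathbf{E}\Big(f(X_0)\,\Big|\,\mathcal{F}^{Z,0}_{]-\infty,0]}\vee\bigvee_{\ell\ge 0}\mathcal{F}^{B,\ell}_{]-\infty,0]}\Big)\quad\text{in }L^2(\mathbf{P})$$

by the martingale convergence theorem. But using again independence 

$$\mathbf{E}\Big(f(X_0)\,\Big|\,\mathcal{F}^{Z,0}_{]-\infty,0]}\vee\bigvee_{\ell\ge 0}\mathcal{F}^{B,\ell}_{]-\infty,0]}\Big)=\mathbf{E}\Big(f(X_0)\,\Big|\,\mathcal{F}^{h(X)}_{]-\infty,0]}\vee\bigvee_{\ell\ge 0}\mathcal{F}^{B,\ell}_{]-\infty,0]}\Big)=\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{h(X)}_{]-\infty,0]})\quad\mathbf{P}\text{-a.s.},$$

and the claim follows directly. □ 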

We can now prove the implications 1 ⇒ 2 and 2 ⇒ 1 of theorem 3.1. 

Proof of theorem 3.1, 1 ⇒ 2. Suppose the filter achieves maximal accuracy. Then 
$e(f)=0$, so that 

$$f(X_0)=\mathbf{E}(f(X_0)\,|\,\mathcal{F}^{h(X)}_{]-\infty,0]})\quad\mathbf{P}\text{-a.s.}$$

whenever $f$ is bounded and measurable. As the signal state space $E$ is Polish, it is 
isomorphic as a measure space to a subset of the interval $[0,1]$. Denote by $\iota:E\to[0,1]$ 
this isomorphism. Setting $f=\iota$ above, we find that $\iota(X_0)$ coincides $\mathbf{P}$-a.s. with an 
$\mathcal{F}^{h(X)}_{]-\infty,0]}$-measurable random variable. Therefore so does $X_0$. □ 

Proof of theorem 3.1, 2 ⇒ 1. Suppose the filtering model is stably invertible. It 
follows immediately that $e(f)=0$ for every test function $f$ with $\int f^2\,d\pi<\infty$. □ 

3.2. Proof of 2 ⇔ 3. 

Proof of theorem 3.1, 2 ⇒ 3. Let $\mu,\nu\ll\pi$ and $\mu\perp\nu$. Define the event 
$M=\{\frac{d\mu}{d\pi}(X_0)>0\}$. If the filtering model is stably invertible, then $I_M$ coincides 
$\mathbf{P}$-a.s. with $I_H$ for some $H\in\mathcal{F}^{h(X)}_{]-\infty,0]}$. But then $\mathbf{P}^\mu(H)=1$ and $\mathbf{P}^\nu(H)=0$, so the 
filtering model is strongly reconstructible. □ 

Proof of theorem 3.1, 3 ⇒ 2. We suppose the model is strongly reconstructible. 
Let $\{A_1,\ldots,A_m\}$ be a partition of $E$ with $\pi(A_i)>0$ for all $i$. Define $\pi_i(B)=
\pi(B\cap A_i)/\pi(A_i)$. Then $\pi_i\ll\pi$ for every $i$ and $\pi_i\perp\pi_j$ for $i\ne j$. By strong 
reconstructibility, we may therefore find disjoint $H_1,\ldots,H_m\in\mathcal{F}^{h(X)}_{]-\infty,0]}$ such that 
$\mathbf{P}^{\pi_i}(H_j)=\delta_{ij}$ for all $i,j$ or, in other words, $\mathbf{P}(H_i\,|\,X_0\in A_i)=1$. 
Now let $f(x)=\sum_i f_i I_{A_i}(x)$, and let $H=\sum_i f_i I_{H_i}$ (the $f_i$ are distinct). Then 

$$\mathbf{P}(f(X_0)=H)=\sum_{i=1}^m\mathbf{P}(f(X_0)=H\,|\,X_0\in A_i)\,\mathbf{P}(X_0\in A_i)=\sum_{i=1}^m\mathbf{P}(H_i\,|\,X_0\in A_i)\,\mathbf{P}(X_0\in A_i)=1.$$

Therefore $f(X_0)$ coincides $\mathbf{P}$-a.s. with the $\mathcal{F}^{h(X)}_{]-\infty,0]}$-measurable random variable $H$. 

Evidently $f(X_0)$ coincides $\mathbf{P}$-a.s. with an $\mathcal{F}^{h(X)}_{]-\infty,0]}$-measurable random variable 
whenever $f$ is a simple function. But recall that any measurable function can be 
approximated monotonically by a sequence of simple functions, so that the claim evidently 
holds for any measurable function $f$. It suffices to note that as the signal state space 
$E$ is Polish, it is isomorphic as a measure space to a subset of the interval $[0,1]$, so 
that we may apply our conclusion to the isomorphism $\iota$. □ 

3.3. Proof of 1 ⇒ 4. The proof of this implication follows from the following 
observation: it can be read off from the proof of lemmas 2.12 and 2.13 that 

$$\mathbf{E}\big(\{f(X_t)-\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{h(X)}_{[0,t]}\vee\sigma(X_0))\}^2\big)\le e_{t+s}(f,0)$$

whenever $\int f^2\,d\pi<\infty$ and $t,s\ge 0$. Therefore, if the filter achieves maximal accuracy, 

$$\mathbf{E}\big(\{f(X_t)-\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{h(X)}_{[0,t]}\vee\sigma(X_0))\}^2\big)\le\lim_{s\to\infty}e_{t+s}(f,0)=e(f)=0.$$

Thus $f(X_t)$ coincides $\mathbf{P}$-a.s. with an $\mathcal{F}^{h(X)}_{[0,t]}\vee\sigma(X_0)$-measurable random variable. □ 

3.4. Proof of 3 ⇒ 5. Suppose the model is not reconstructible. We claim that 
it cannot be strongly reconstructible. Indeed, if the model is not reconstructible, then 
there exist $\mu,\nu\ll\pi$, $\mu\ne\nu$ such that 

$$\mathbf{P}^\mu|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}=\mathbf{P}^\nu|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}.$$

Define $\mu'=(\mu-\nu)^+/(\mu-\nu)^+(E)$ and $\nu'=(\mu-\nu)^-/(\mu-\nu)^-(E)$. Clearly $\mu',\nu'$ are 
probability measures and $\mu'-\nu'\propto\mu-\nu$ (as $(\mu-\nu)^+(E)=(\mu-\nu)^-(E)$), so 

$$\mu',\nu'\ll\pi,\qquad\mu'\perp\nu',\qquad\text{and}\qquad\mathbf{P}^{\mu'}|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}=\mathbf{P}^{\nu'}|_{\mathcal{F}^{h(X)}_{]-\infty,0]}}.$$

Hence the model can certainly not be strongly reconstructible. □ 
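
On a finite state space the construction in this proof is easy to carry out by hand. The following is a minimal sketch, assuming measures are represented as probability vectors; the function name `singular_pair` and the sample vectors are illustrative and not part of the paper. It forms the positive and negative parts of $\mu-\nu$ and normalizes them, which yields two mutually singular probability vectors whose difference is proportional to $\mu-\nu$.

```python
import numpy as np

def singular_pair(mu, nu):
    """Return (mu', nu'): the normalized positive and negative parts of mu - nu.

    Assumes mu and nu are distinct probability vectors on the same finite set.
    """
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    diff = mu - nu
    pos = np.clip(diff, 0.0, None)    # (mu - nu)^+
    neg = np.clip(-diff, 0.0, None)   # (mu - nu)^-
    mass = pos.sum()                  # equals neg.sum(), since mu and nu both sum to 1
    if mass == 0.0:
        raise ValueError("mu and nu coincide")
    return pos / mass, neg / mass

# Hypothetical example on a three-point state space.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
mu_prime, nu_prime = singular_pair(mu, nu)
```

The supports of the two outputs are disjoint by construction, which is the finite state version of $\mu'\perp\nu'$.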

4. Finite State Space. We have shown above that invertibility and reconstructibility 
are necessary conditions for the filter to achieve maximal accuracy. In 
this section, we will show that in the case where the signal state space is a finite set, 
these conditions together are also sufficient. This is particularly useful as invertibility 
and reconstructibility can be verified algebraically in terms of the model coefficients, 
while verifying stable invertibility or strong reconstructibility directly is difficult. 

Let $(X_t)_{t\in\mathbb{R}}$ be a stationary finite state Markov process. The state space is $E=
\{1,\ldots,d\}$, the transition law is determined by the $d\times d$ transition intensities matrix 
$\Lambda=(\lambda_{ij})$, and the stationary measure is the $d$-dimensional vector $\pi=(\pi_i)$. The 
observation function is also represented as a $d$-dimensional vector $h=(h_i)$ (as no 
confusion may arise, we will make no distinction between functions and measures on 
$E$ and their representing vectors). We will assume that $\pi_i>0$ for all $i$. 

Remark 4.1. The assumption that $\pi_i>0$ for all $i$ is made for convenience only 
and does not entail any loss of generality. Indeed, if any of the entries of the stationary 
distribution $\pi$ are zero, then we may remove the corresponding points from the state 
space $E$ and apply our results below to the resulting stationary Markov process on 
the reduced state space. Of course, the algebraic conditions in lemmas 4.3 and 4.4 
below must then be applied to the coefficients of the reduced model. 

The main result in this section is the following. 

Theorem 4.2. For the finite state filtering model, the following are equivalent: 

1 . The filter achieves maximal accuracy. 

2. The filtering model is stably invertible. 

3. The filtering model is strongly reconstructible. 

4. The filtering model is invertible and reconstructible. 

Clearly, all that remains to be shown is the implication 4 ⇒ 1. Before we proceed 
to the proof, let us show how invertibility and reconstructibility can be verified in 
terms of the model parameters. To this end we give the following two lemmas. 

Lemma 4.3. The finite state filtering model is invertible iff the following hold: 

1. For any $i\ne j$ such that $\lambda_{ij}>0$, we have $h_i\ne h_j$. 

2. For any $i\ne j\ne k$ such that $\lambda_{ij}>0$, $\lambda_{ik}>0$, we have $h_j\ne h_k$. 

Proof. Suppose the stated conditions hold. By the first condition, $h(X_t)$ jumps 
at every jump time $\tau$ of $X_t$. By the second condition, if $X_{\tau-}$ is known, then 
$X_\tau=F[X_{\tau-},h(X_\tau)]$ for a suitable function $F$. Therefore, if $X_s$ is known for some 
$s<t$, then the entire path $(X_r)_{s\le r\le t}$ can be reconstructed from $(h(X_r))_{s\le r\le t}$ by 
a straightforward algorithm. This establishes invertibility. Conversely, if one of the 
stated conditions does not hold, it is easily seen that invertibility must fail. □ 
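
The two conditions of lemma 4.3 can be checked mechanically from the model coefficients. The following is a minimal sketch, assuming a scalar observation vector `h` and an intensity matrix `Lam` given as arrays; the function name `is_invertible`, the tolerance `tol`, and the sample models are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def is_invertible(Lam, h, tol=1e-12):
    """Check the two conditions of lemma 4.3 for intensities Lam and observations h."""
    Lam = np.asarray(Lam, dtype=float)
    h = np.asarray(h, dtype=float)
    d = len(h)
    for i in range(d):
        # States that can be reached from i in a single jump.
        succ = [j for j in range(d) if j != i and Lam[i, j] > tol]
        # Condition 1: every jump out of i must change the observed value.
        if any(abs(h[i] - h[j]) <= tol for j in succ):
            return False
        # Condition 2: distinct jump targets must carry distinct observed values.
        for a in range(len(succ)):
            for b in range(a + 1, len(succ)):
                if abs(h[succ[a]] - h[succ[b]]) <= tol:
                    return False
    return True

# Hypothetical two-state and three-state models.
Lam2 = [[-1.0, 1.0], [2.0, -2.0]]
Lam3 = [[-2.0, 1.0, 1.0], [1.0, -1.0, 0.0], [1.0, 0.0, -1.0]]
```

For instance, `is_invertible(Lam3, [0, 1, 1])` fails the second condition, because state 1 can jump to two states that look identical through `h`.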

Lemma 4.4. The finite state filtering model is reconstructible if and only if 

$$\dim\big(\mathrm{span}\{H^{n_0}\tilde\Lambda H^{n_1}\tilde\Lambda\cdots\tilde\Lambda H^{n_k}\mathbf{1}:k,n_0,\ldots,n_k\ge 0\}\big)=d.$$

Here $\mathbf{1}=(1\ 1\ \cdots\ 1)^*$ is the column vector of ones; $\tilde\Lambda=(\tilde\lambda_{ij})$ is the transition 
intensities matrix whose off-diagonal entries satisfy $\tilde\lambda_{ij}=\lambda_{ji}\pi_j/\pi_i$; and $H=\mathrm{diag}(h)$. 

Proof. It is readily verified that the time reversed signal $\tilde X_t$ is a finite state Markov 
process with transition intensities matrix $\tilde\Lambda$ [15, theorem 3.7.1]. As we have assumed 
without loss of generality that every point of the state space is positively charged 
by $\pi$, any probability measure $\mu$ on $E$ is absolutely continuous $\mu\ll\pi$. Therefore 
reconstructibility in this setting is simply observability of the reverse time system, 
and the condition in the lemma follows along the lines of [20, lemma 9]. □ 

The condition in this last lemma can always be computed in a finite number of 
steps; see [20, remark 11] for further comments and a simple but explicit algorithm. 
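
One way to compute the dimension in lemma 4.4 numerically is sketched below. This is not the algorithm of the cited remark, only an illustration under stated assumptions: the chain is irreducible (so that the stationary vector is unique and positive), and the span in the lemma is computed as the smallest subspace that contains the vector of ones and is invariant under both $H$ and the reversed intensity matrix. The function names and the tolerance are hypothetical.

```python
import numpy as np

def stationary_distribution(Lam):
    """Solve pi @ Lam = 0 with sum(pi) = 1 by least squares (irreducible chain assumed)."""
    Lam = np.asarray(Lam, dtype=float)
    d = Lam.shape[0]
    M = np.vstack([Lam.T, np.ones((1, d))])
    b = np.zeros(d + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, b, rcond=None)
    return pi

def reconstructibility_dimension(Lam, h, tol=1e-9):
    """Dimension of the span in lemma 4.4; the model is reconstructible iff it equals d."""
    Lam = np.asarray(Lam, dtype=float)
    d = Lam.shape[0]
    pi = stationary_distribution(Lam)
    # Reversed intensities: entry (i, j) is Lam[j, i] * pi[j] / pi[i].
    Lam_rev = Lam.T * pi[None, :] / pi[:, None]
    H = np.diag(np.asarray(h, dtype=float))
    basis = np.ones((d, 1)) / np.sqrt(d)
    while True:
        # Enlarge the subspace by applying H and Lam_rev, then re-orthonormalize.
        cand = np.hstack([basis, H @ basis, Lam_rev @ basis])
        U, s, _ = np.linalg.svd(cand, full_matrices=False)
        rank = int(np.sum(s > tol * s[0]))
        if rank == basis.shape[1]:
            return rank   # subspace is invariant, so it equals the span in the lemma
        basis = U[:, :rank]

# Hypothetical two-state and three-state models.
Lam2 = [[-1.0, 1.0], [2.0, -2.0]]
Lam3 = [[-2.0, 1.0, 1.0], [1.0, -1.0, 0.0], [1.0, 0.0, -1.0]]
```

The loop terminates after at most $d$ passes, since the rank can only increase and is bounded by $d$.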

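4.1. Proof of 4 ⇒ 1. As we assume invertibility, we have 

$$f(X_t)=\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{h(X)}_{[0,t]}\vee\sigma(X_0))\quad\mathbf{P}\text{-a.s. for all functions }f.$$

Therefore, we can evidently write 

$$e_t(f,0)=\mathbf{E}\big(\{\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{h(X)}_{[0,t]}\vee\sigma(X_0))-\mathbf{E}(f(X_t)\,|\,\mathcal{F}^{h(X)}_{[0,t]})\}^2\big).$$

We would like to show that $e(f)=0$, i.e., that $e_t(f,0)\to 0$ as $t\to\infty$ for any $f$. As 

$$e_t(\alpha f+\beta g,0)\le 2\alpha^2e_t(f,0)+2\beta^2e_t(g,0),$$

it clearly suffices to restrict to positive $f>0$ such that $\int f\,d\pi=1$. Fix such a function, 
and define the probability measure $d\nu=f\,d\pi$. Then, writing $\tilde X_t=X_{-t}$ for the time 
reversed signal and using the stationarity of $\mathbf{P}$, 

$$e_t(f,0)\le 2\|f\|_\infty\,\mathbf{E}\big(\big|\mathbf{E}(f(\tilde X_0)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]}\vee\sigma(\tilde X_t))-\mathbf{E}(f(\tilde X_0)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})\big|\big)$$

$$=2\|f\|_\infty\,\mathbf{E}^\nu\Big(\mathbf{E}\Big(\Big|\frac{\mathbf{E}(f(\tilde X_0)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]}\vee\sigma(\tilde X_t))}{\mathbf{E}(f(\tilde X_0)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})}-1\Big|\,\Big|\,\mathcal{F}^{h(\tilde X)}_{[0,t]}\Big)\Big)$$

$$=2\|f\|_\infty\,\mathbf{E}^\nu\big(\|\mathbf{P}^\nu(\tilde X_t\in\cdot\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})-\mathbf{P}(\tilde X_t\in\cdot\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})\|_{TV}\big)$$

$$=2\|f\|_\infty\sum_{i=1}^d\mathbf{E}^\nu\big(\big|\mathbf{P}^\nu(\tilde X_t=i\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})-\mathbf{P}(\tilde X_t=i\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})\big|\big),$$

where we used the Bayes formula as in [21, lemma 5.6 and corollary 5.7] to show that 

$$\|\mathbf{P}^\nu(\tilde X_t\in\cdot\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})-\mathbf{P}(\tilde X_t\in\cdot\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})\|_{TV}=\mathbf{E}\Big(\Big|\frac{\mathbf{E}(f(\tilde X_0)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]}\vee\sigma(\tilde X_t))}{\mathbf{E}(f(\tilde X_0)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})}-1\Big|\,\Big|\,\mathcal{F}^{h(\tilde X)}_{[0,t]}\Big).$$

But by reconstructibility and [20, corollary 1], we find that 

$$e_t(f,0)\le 2\|f\|_\infty\sum_{i=1}^d\mathbf{E}^\nu\big(\big|\mathbf{P}^\nu(\tilde X_t=i\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})-\mathbf{P}(\tilde X_t=i\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})\big|\big)\xrightarrow{t\to\infty}0.$$

Therefore $e_t(f,0)\to 0$ as $t\to\infty$ for any $f$, and the proof is complete. □ 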

Remark 4.5. It is interesting to note that all the steps in this proof have 
counterparts in the general setting of section 2. In particular, it is not difficult to 
establish that in general, to achieve maximal accuracy it is sufficient that the model 
is invertible and that the time-reversed noiseless filter is stable in the sense that 

$$\mathbf{E}^\nu\big(\|\mathbf{P}^\nu(\tilde X_t\in\cdot\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})-\mathbf{P}(\tilde X_t\in\cdot\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})\|_{TV}\big)\xrightarrow{t\to\infty}0\quad\text{whenever}\quad\Big\|\frac{d\nu}{d\pi}\Big\|_\infty<\infty.$$

On the other hand, the results in [20] are easily adapted to show that if the model is 
reconstructible, then the time-reversed noiseless filter is stable in the sense that 

$$\mathbf{E}^\nu\big(\big|\mathbf{E}^\nu(g(\tilde X_t)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})-\mathbf{E}(g(\tilde X_t)\,|\,\mathcal{F}^{h(\tilde X)}_{[0,t]})\big|\big)\xrightarrow{t\to\infty}0\quad\text{whenever}\quad\nu\ll\pi,\ \Big\|\frac{d\nu}{d\pi}\Big\|_\infty<\infty,\ g\in L^2(\pi).$$

Therefore, invertibility and reconstructibility imply maximal accuracy if one can close 
the gap between total variation stability and individual stability of the time-reversed 
noiseless filter. This is automatic in a finite state space (trivial) and in a countable 
state space (as the sequence space $\ell_1$ has the Schur property 0, theorem 4.32]). 
When the signal state space is continuous, however, invertibility and reconstructibility 
are typically not sufficient to guarantee that the filter achieves maximal accuracy; a 
counterexample is given in the next section. 



WHEN DO NONLINEAR FILTERS ACHIEVE MAXIMAL ACCURACY? 






5. Linear Gaussian Models. In this section, we consider a linear Gaussian 
model of the following form ($t\in\mathbb{R}$): 

$$dX_t=AX_t\,dt+D\,dW_t,\qquad dY_t^\kappa=HX_t\,dt+\kappa\,dB_t.$$

Here $(B_t)_{t\in\mathbb{R}}$ and $(W_t)_{t\in\mathbb{R}}$ are independent two-sided Wiener processes of dimensions 
$n$ and $m$, respectively, and the signal state space is $E=\mathbb{R}^p$. We refer to [11] for 
the definitions and basic properties of the various elements of linear systems theory 
(stable matrix, detectability, stabilizability, etc.) which are used below. 
We make the following assumptions: 

1. $A\in\mathbb{R}^{p\times p}$ is an asymptotically stable matrix and $(X_t)_{t\in\mathbb{R}}$ is stationary; 

2. $D\in\mathbb{R}^{p\times m}$ and $H\in\mathbb{R}^{n\times p}$ are matrices of full rank and $m,n\le p$. 

The stability of $A$ ensures that the signal is ergodic, and in particular that the 
stationary solution of the signal equation exists and is unique. 

Remark 5.1. The rank assumption on $D$ and $H$ and the assumption on the 
dimensions $m,n,p$ are made for convenience only and do not entail any loss of generality. 
Indeed, when the matrices are not of full rank we can trivially obtain an equivalent 
model of full rank by reducing the dimensions of $W$, $B$ and/or $Y^\kappa$. Similarly, if $m>p$ 
we can obtain an equivalent model with $m=p$. Of course, the algebraic condition in 
theorem 5.2 must then be applied to the coefficients of the reduced model. If $n>p$ 
the filter is trivially seen to achieve maximal accuracy (as then $H$, being of full rank, 
has a left inverse, so the noiseless observations are fully informative). 

The maximal achievable accuracy problem in the linear Gaussian setting was considered 
in a classic paper of Kwakernaak and Sivan [12], where an almost¹ necessary 
and sufficient condition was obtained. Their approach is surprisingly complicated, 
however, and relies on rather explicit computations of the behavior of Riccati equations 
in the limit of vanishing noise. In this section we give a direct proof of their 
theorem by verifying the stable invertibility property. 

Theorem 5.2. In the linear Gaussian setting of this section, the filter achieves 
maximal accuracy if and only if the matrix $H(\lambda I-A)^{-1}D$ has linearly independent 
columns for all $\lambda\in\mathbb{C}$ with $\mathrm{Re}\,\lambda>0$. 

The result of Kwakernaak and Sivan follows easily. 

Corollary 5.3 (Kwakernaak-Sivan). The following hold. 

1. If $m>n$, then the filter does not achieve maximal accuracy. 

2. If $m=n$, then the filter achieves maximal accuracy if and only if $\det[H(\lambda I-A)^{-1}D]$ 
is nonzero for any $\lambda\in\mathbb{C}$ with $\mathrm{Re}\,\lambda>0$. 

3. If $m<n$ and there exists $M\in\mathbb{R}^{m\times n}$ such that $\det[MH(\lambda I-A)^{-1}D]$ is 
nonzero for any $\lambda\in\mathbb{C}$ with $\mathrm{Re}\,\lambda>0$, then the filter achieves maximal accuracy. 

Proof. We consider each case separately. 

1. $H(\lambda I-A)^{-1}D$ has $m$ columns each of which is an $n$-dimensional vector. 
Therefore if $m>n$, the columns cannot be linearly independent for any $\lambda$. 

2. When $m=n$ the matrix $H(\lambda I-A)^{-1}D$ is square, so that it has linearly 
independent columns if and only if $\det[H(\lambda I-A)^{-1}D]\ne 0$. 

3. If $\det[MH(\lambda I-A)^{-1}D]$ is nonzero, the square matrix $MH(\lambda I-A)^{-1}D$ 
has linearly independent columns. Then certainly the columns of $H(\lambda I-A)^{-1}D$ are 
linearly independent. 

In view of these facts, the corollary follows by applying theorem 5.2. □ 

¹ The result of Kwakernaak and Sivan, stated as corollary 5.3, gives necessary and sufficient 
conditions for the case $m\ge n$ but only a sufficient condition for the case $m<n$. 

Remark 5.4. As is pointed out by Godbole [12], the condition of theorem 3.1 
corresponds to the requirement that the model is invertible and that the inverse has 
no unstable modes (in the sense of linear systems). Indeed, note that $H(\lambda I - A)^{-1}D$ 
is the transfer function associated to our filtering model, so that invertibility holds 
in this setting if and only if the matrix $H(\lambda I - A)^{-1}D$ has a left inverse for all but 
a finite number of $\lambda \in \mathbb{C}$ (see, e.g., [?, theorem 5]). The inverse is again a linear 
system whose transfer function is the left inverse of $H(\lambda I - A)^{-1}D$, so that the lack of 
right halfplane zeros of $H(\lambda I - A)^{-1}D$ ensures that the inverse system does not have 
any unstable poles. If there are additionally no zeros on the imaginary axis, then the 
inverse system is even asymptotically stable and the heuristic outlined in remark 2.9 
can be rigorously implemented. 

However, it is not immediately obvious from such arguments that stable invertibility 
follows even when there are zeros on the imaginary axis, or that the model cannot 
be stably invertible when there are right halfplane zeros. In the proof of theorem 5.2, 
the former problem is circumvented by using the idea in [12] of using an approximate, 
rather than exact, inverse system. The latter problem is easily resolved directly in 
our setting, and the proof of this part of the theorem is substantially simpler than 
the corresponding arguments of Kwakernaak and Sivan. 



5.1. An example. Before we proceed to the proof of theorem 5.2, let us demon- 
strate by means of an example that, unlike in the finite state setting, invertibility and 
reconstructibility are not always sufficient to ensure that the filter achieves maximal 
accuracy. This implies the existence of a gap between total variation and individual 
stability of the time reversed filter, discussed in remark 1.5. 

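For our example, let $m = n = 1$ and $p = 2$, and we set 

$$A = \begin{bmatrix} -1 & 0 \\ 0 & -4 \end{bmatrix}, \qquad D = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & -2 \end{bmatrix}.$$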



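This model satisfies all the assumptions of this section. We now compute 

$$H(\lambda I - A)^{-1}D = \begin{bmatrix} 1 & -2 \end{bmatrix} \begin{bmatrix} (\lambda+1)^{-1} & 0 \\ 0 & (\lambda+4)^{-1} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = (\lambda+1)^{-1} - 2(\lambda+4)^{-1}.$$

Therefore $H(\lambda I - A)^{-1}D = 0$ for $\lambda = 2$, so by theorem 5.2 the filter does not 
achieve maximal accuracy. However, we claim that the model is both invertible and 
reconstructible. 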
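The zero of the transfer function at $\lambda = 2$ can be confirmed with exact rational arithmetic. The following is a minimal Python sketch rather than part of the argument: the helper name `transfer` is our own, and the code hard-codes the matrices $A = \mathrm{diag}(-1,-4)$, $D = (1,1)^*$, $H = [1\ \ {-2}]$ of the example, using the fact that $A$ is diagonal.

```python
from fractions import Fraction

# Example matrices: A = diag(-1, -4), D = (1, 1)^T, H = [1, -2].
A_DIAG = (Fraction(-1), Fraction(-4))
D = (Fraction(1), Fraction(1))
H = (Fraction(1), Fraction(-2))


def transfer(lam):
    """Return H (lam*I - A)^{-1} D; this shortcut is valid because A is diagonal."""
    return sum(h * d / (lam - a) for h, d, a in zip(H, D, A_DIAG))


value_at_2 = transfer(Fraction(2))  # the right halfplane zero
value_at_1 = transfer(Fraction(1))  # another right halfplane point, for comparison
print(value_at_2, value_at_1)
```

The transfer function vanishes exactly at $\lambda = 2$ but not at $\lambda = 1$, consistent with it being nonzero for all but finitely many $\lambda$.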

To prove invertibility, it suffices to note that $H(\lambda I - A)^{-1}D$ is nonzero (hence 
left invertible) for all but a finite number of $\lambda \in \mathbb{C}$. To prove reconstructibility, we 
use the fact that the reverse time signal $\tilde X_t$ satisfies an equation of the form 

$$\tilde X_t = \tilde X_0 + \int_0^t \Sigma A^*\Sigma^{-1}\tilde X_s\,ds + D\tilde W_t,$$

where $\tilde W_t$ is a suitably defined Wiener process and $\Sigma$ denotes the stationary covariance 
matrix of the signal. The matrix $\Sigma$ can be computed as the unique solution of the 
Lyapunov equation: 

$$A\Sigma + \Sigma A^* + DD^* = 0, \qquad \Sigma = \begin{bmatrix} 1/2 & 1/5 \\ 1/5 & 1/8 \end{bmatrix}.$$






Note that $\Sigma$ is a strictly positive matrix. Therefore, the model is evidently 
reconstructible if $H\Sigma$ and $H\Sigma A^*$ are linearly independent (so that the time reversed model 
is observable). But this is easily established to be the case by explicit computation. 
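The explicit computation can be sketched in a few lines of Python with exact fractions. This is only an illustrative check under the example's data: the helper names (`matmul`, `transpose`, `add`) are ours, and since all matrices are real the adjoint $A^*$ is implemented as a transpose.

```python
from fractions import Fraction as Fr

A = [[Fr(-1), Fr(0)], [Fr(0), Fr(-4)]]
D = [[Fr(1)], [Fr(1)]]
H = [[Fr(1), Fr(-2)]]
SIGMA = [[Fr(1, 2), Fr(1, 5)], [Fr(1, 5), Fr(1, 8)]]


def matmul(P, Q):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


# Residual of the Lyapunov equation A*Sigma + Sigma*A^T + D*D^T; it should vanish.
residual = add(add(matmul(A, SIGMA), matmul(SIGMA, transpose(A))),
               matmul(D, transpose(D)))

# Determinant of Sigma (positive, together with a positive diagonal entry).
sigma_det = SIGMA[0][0] * SIGMA[1][1] - SIGMA[0][1] * SIGMA[1][0]

# The two rows H*Sigma and H*Sigma*A^T, and the determinant of the 2x2 matrix they form.
h_sigma = matmul(H, SIGMA)[0]
h_sigma_at = matmul(matmul(H, SIGMA), transpose(A))[0]
obs_det = h_sigma[0] * h_sigma_at[1] - h_sigma[1] * h_sigma_at[0]

print(residual, sigma_det, obs_det)
```

A nonzero `obs_det` means the two row vectors are linearly independent, which is the condition used in the text.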



5.2. Proof of Theorem 5.2. We will show that the condition of the theorem 
is necessary and sufficient for stable invertibility. The necessity part of the proof is 
closely related to a problem of Karhunen [?], while sufficiency is proved along the lines 
of [12]. 

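In the following, let us write $Z_t = HX_t$ ($t \in \mathbb{R}$) for notational simplicity. We 
introduce the Hilbert spaces of random variables $\mathfrak{H}_X, \mathfrak{H}_Z, \mathfrak{H}_W \subset L^2(\mathbf{P})$ as follows: 

$$\mathfrak{H}_X = L^2(\mathbf{P})\text{-cl}\,\{v_1^*X_{t_1} + \cdots + v_k^*X_{t_k} : k \in \mathbb{N},\ t_1,\ldots,t_k \le 0,\ v_1,\ldots,v_k \in \mathbb{R}^p\},$$
$$\mathfrak{H}_Z = L^2(\mathbf{P})\text{-cl}\,\{v_1^*Z_{t_1} + \cdots + v_k^*Z_{t_k} : k \in \mathbb{N},\ t_1,\ldots,t_k \le 0,\ v_1,\ldots,v_k \in \mathbb{R}^n\},$$
$$\mathfrak{H}_W = L^2(\mathbf{P})\text{-cl}\,\{v_1^*W_{t_1} + \cdots + v_k^*W_{t_k} : k \in \mathbb{N},\ t_1,\ldots,t_k \le 0,\ v_1,\ldots,v_k \in \mathbb{R}^m\}.$$

For an $\mathbb{R}^k$-valued random variable $K$ we will write, e.g., $K \in \mathfrak{H}_X$ when $v^*K \in \mathfrak{H}_X$ 
for every $v \in \mathbb{R}^k$. 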

As the joint process $(X_t, Z_t, W_t)_{t\in\mathbb{R}}$ is Gaussian, the stable invertibility problem 
is essentially linear and can be reduced to the investigation of the spaces $\mathfrak{H}_X$, $\mathfrak{H}_Z$, $\mathfrak{H}_W$. 

Lemma 5.5. The model is stably invertible if and only if $\mathfrak{H}_Z = \mathfrak{H}_W$. 

Proof. By definition, the model is stably invertible iff $X_0$ coincides $\mathbf{P}$-a.s. with a 
$\sigma\{Z_s : s \le 0\}$-measurable random variable, i.e., iff $\mathbf{E}(X_0|Z_{]-\infty,0]}) = X_0$. However, as 
$(X_0, Z_s : s \le 0)$ is Gaussian, it is well known that $\mathbf{E}(X_0|Z_{]-\infty,0]}) \in \mathfrak{H}_Z$ (see, e.g., [16, 
lemma 6.2.2]). The model is therefore stably invertible iff $X_0 \in \mathfrak{H}_Z$. 

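To proceed, note that 

$$X_t = \int_{-\infty}^t e^{A(t-s)}D\,dW_s, \qquad Z_t = \int_{-\infty}^t He^{A(t-s)}D\,dW_s \qquad (t \in \mathbb{R})$$

as $A$ is asymptotically stable. Therefore clearly $X_0 \in \mathfrak{H}_W$ and $\mathfrak{H}_Z \subset \mathfrak{H}_W$. If $\mathfrak{H}_Z = \mathfrak{H}_W$ 
it then follows immediately that $X_0 \in \mathfrak{H}_Z$. It therefore remains to show that $X_0 \in \mathfrak{H}_Z$ 
implies $\mathfrak{H}_W \subset \mathfrak{H}_Z$. To this end, assume $X_0 \in \mathfrak{H}_Z$. By stationarity, $X_t \in \mathfrak{H}_Z$ also for 
$t \le 0$. Therefore evidently $\mathfrak{H}_X \subset \mathfrak{H}_Z$. But 

$$DW_s = X_s - X_0 + \int_s^0 AX_u\,du, \qquad s \le 0.$$

As $D$ has full rank, we find that $\mathfrak{H}_W \subset \mathfrak{H}_X$. Therefore $\mathfrak{H}_W \subset \mathfrak{H}_Z$ as required. □ 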



We can now complete the proof of theorem 5.2. 

Proof of theorem 5.2. Suppose $H(\lambda I - A)^{-1}D$ does not have linearly independent 
columns for some $\lambda \in \mathbb{C}$ with $\operatorname{Re}\lambda > 0$. Then there exists $0 \ne w \in \mathbb{C}^m$ such that 
$H(\lambda I - A)^{-1}Dw = 0$, and we define 

$$U = U_1 + iU_2 := \int_{-\infty}^0 (e^{\lambda s}w)^*\,dW_s, \qquad U_1, U_2 \in \mathfrak{H}_W.$$

We can now compute 

$$\mathbf{E}(\bar U\, v^*Z_u) = \int_{-\infty}^u e^{\lambda s}\, v^*He^{A(u-s)}Dw\,ds = e^{\lambda u}\, v^*H(\lambda I - A)^{-1}Dw = 0$$

for all $u \le 0$, $v \in \mathbb{R}^n$. In particular, as $v^*Z_u$ is real-valued, $\langle U_1, v^*Z_u\rangle_{L^2(\mathbf{P})} = 
\langle U_2, v^*Z_u\rangle_{L^2(\mathbf{P})} = 0$ for $u \le 0$, $v \in \mathbb{R}^n$, so that $U_1, U_2 \perp \mathfrak{H}_Z$. But as $U$ is nonzero, 
evidently $\mathfrak{H}_Z \ne \mathfrak{H}_W$ and the model is not stably invertible. 

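Conversely, suppose that the matrix $H(\lambda I - A)^{-1}D$ has linearly independent 
columns for all $\lambda \in \mathbb{C}$ with $\operatorname{Re}\lambda > 0$. We will prove that for any $\varepsilon > 0$, there is a 
random variable $X_0^\varepsilon$ of the form 

$$X_0^\varepsilon = \int_{-\infty}^0 m_\varepsilon(s)\,Z_s\,ds \in \mathfrak{H}_Z$$

such that $\|X_0^\varepsilon - X_0\|_{L^2(\mathbf{P})} < \varepsilon$. Then $X_0 \in \mathfrak{H}_Z$ (as $\mathfrak{H}_Z$ is closed), so the model is 
stably invertible. 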

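To prove the claim, fix $\varepsilon > 0$. Then 

$$\int_{-\infty}^0 m_\varepsilon(s)\int_{-\infty}^s He^{A(s-u)}D\,dW_u\,ds = \int_{-\infty}^0\int_u^0 m_\varepsilon(s)\,He^{A(s-u)}D\,ds\,dW_u,$$

provided that $m_\varepsilon$ is bounded and the function 

$$T_\varepsilon(u) := \int_u^0 m_\varepsilon(s)\,He^{A(s-u)}D\,ds$$

is square integrable (this is justified by truncating the lower bounds on the integrals 
and applying Fubini's theorem for stochastic integrals [17, theorem IV.64]). Note that 

$$\|X_0^\varepsilon - X_0\|^2_{L^2(\mathbf{P})} = \int_{-\infty}^0 \|T_\varepsilon(u) - e^{-Au}D\|_F^2\,du,$$

where $\|C\|_F^2 = \mathrm{Tr}[CC^*]$ is the Frobenius norm. Define for $\operatorname{Re}\lambda > 0$ the Laplace 
transforms 

$$\hat m_\varepsilon(\lambda) = \int_{-\infty}^0 e^{\lambda s}m_\varepsilon(s)\,ds, \qquad \hat T_\varepsilon(\lambda) = \int_{-\infty}^0 e^{\lambda s}T_\varepsilon(s)\,ds = \hat m_\varepsilon(\lambda)\,H(\lambda I - A)^{-1}D.$$

By Plancherel's theorem, we can write (see, e.g., [23, pp. 162-163]) 

$$\|X_0^\varepsilon - X_0\|^2_{L^2(\mathbf{P})} = \frac{1}{2\pi}\lim_{x\downarrow 0}\int_{-\infty}^\infty \|\hat m_\varepsilon(x+iy)\,H(\{x+iy\}I - A)^{-1}D - (\{x+iy\}I - A)^{-1}D\|_F^2\,dy.$$

It remains to choose $m_\varepsilon$ with the required properties such that this expression is 
smaller than $\varepsilon^2$. 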

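By our assumption, the left inverse $V(\lambda)$ of the matrix $H(\lambda I - A)^{-1}D$ is defined on 
the right halfplane, i.e., $V(\lambda)H(\lambda I - A)^{-1}D = I$ for $\operatorname{Re}\lambda > 0$. The above expression 
for $\|X_0^\varepsilon - X_0\|_{L^2(\mathbf{P})}$ is therefore identically zero if we choose $\hat m_\varepsilon(\lambda)$ as $(\lambda I - A)^{-1}DV(\lambda)$. 
The problem is that the latter may not be the Laplace transform of a function $m_\varepsilon$ 
with the required properties. We therefore regularize as follows: 

$$\hat m_\varepsilon(\lambda) = \left(\frac{\gamma}{\lambda+\gamma}\right)^{\ell}(\lambda I - A)^{-1}D\,V(\lambda)$$

for some $\gamma > 0$, $\ell \in \mathbb{N}$ to be chosen presently. As $\lambda \mapsto H(\lambda I - A)^{-1}D$ is a rational 
function, $\hat m_\varepsilon$ is rational also. We choose $\ell$ sufficiently large that the degree of the 
denominator is larger than the degree of the numerator. Then $\hat m_\varepsilon$ is strictly proper 
with poles in the closed left halfplane $\operatorname{Re}\lambda \le 0$ only, and is therefore the Laplace 
transform of some bounded function $m_\varepsilon$. Moreover, as 

$$\sup_{x>0}\int_{-\infty}^\infty \|\hat T_\varepsilon(x+iy)\|_F^2\,dy = \sup_{x>0}\int_{-\infty}^\infty \left|\left(\frac{\gamma}{x+iy+\gamma}\right)^{\ell}\right|^2 \|(\{x+iy\}I - A)^{-1}D\|_F^2\,dy \le 2\pi\,\|X_0\|^2_{L^2(\mathbf{P})},$$

the function $T_\varepsilon$ is square integrable by the Paley-Wiener theorem. Finally, as 

$$\|X_0^\varepsilon - X_0\|^2_{L^2(\mathbf{P})} = \frac{1}{2\pi}\int_{-\infty}^\infty \left|\left(\frac{\gamma}{iy+\gamma}\right)^{\ell} - 1\right|^2 \|(iyI - A)^{-1}D\|_F^2\,dy \xrightarrow{\gamma\to\infty} 0$$

by dominated convergence, we may choose $\gamma$ such that $\|X_0^\varepsilon - X_0\|_{L^2(\mathbf{P})} < \varepsilon$. □ 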
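To get a feel for the final dominated convergence step, the error integral can be evaluated numerically in a toy case. The sketch below rests on assumptions that are ours and not the paper's: a hypothetical scalar model $A = -1$, $D = H = 1$ (so $p = m = n = 1$), and a regularizing factor of the form $(\gamma/(\lambda+\gamma))^\ell$. The function name `regularization_error` and the grid size are illustrative choices.

```python
import math


def regularization_error(gamma, ell, n=20000):
    """Approximate (1/2pi) * int |(gamma/(iy+gamma))^ell - 1|^2 * |iy+1|^(-2) dy.

    Assumed scalar model A = -1, D = H = 1. Substituting y = tan(theta) turns
    |iy+1|^(-2) dy into d(theta), so the integral runs over (-pi/2, pi/2).
    """
    step = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -math.pi / 2 + (k + 0.5) * step  # midpoint rule avoids the endpoints
        y = math.tan(theta)
        factor = (gamma / complex(gamma, y)) ** ell
        total += abs(factor - 1.0) ** 2 * step
    return total / (2.0 * math.pi)


errors = [regularization_error(g, 1) for g in (1.0, 10.0, 100.0)]
print(errors)
```

For $\ell = 1$ in this scalar case the integral can be done by partial fractions and equals $1/(2(\gamma+1))$, so the squared error decays like $1/\gamma$ as $\gamma \to \infty$.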

5.3. The unstable case. To be fair, it should be noted that we have not en- 
tirely reproduced the result of Kwakernaak and Sivan as we have assumed that the 
matrix A is asymptotically stable. The result of Kwakernaak and Sivan states that 
the conclusion of corollary 5.3 above holds already under the weaker assumption that 
the filtering model is detectable and stabilizable. Unfortunately, our approach relies 
crucially on the stationarity (or ergodicity) of the signal process, so that one could 
never expect to obtain general results in the setting where the signal may be tran- 
sient. On the other hand, in the linear Gaussian case, the special structure of the 
model allows us to reduce the detectable/stabilizable case to the stationary case. We 
therefore recover the result of Kwakernaak and Sivan in its entirety. 

We develop the relevant argument presently. Let us emphasize, however, that the 
following argument is very specific to the linear Gaussian setting. 

We consider again a linear Gaussian model of the form 

$$X_t = X_0 + \int_0^t AX_u\,du + DW_t,$$
$$Y_t^\kappa = \int_0^t HX_u\,du + \kappa B_t,$$

where $(B_t)_{t\in\mathbb{R}}$ and $(W_t)_{t\in\mathbb{R}}$ are independent two-sided Wiener processes of dimensions 
$n$ and $m$, respectively, and the signal state space is $E = \mathbb{R}^p$. As above, we will assume 
$D \in \mathbb{R}^{p\times m}$ and $H \in \mathbb{R}^{n\times p}$ are matrices of full rank and $m, n \le p$. We do not assume, 
however, that $A \in \mathbb{R}^{p\times p}$ is stable; instead, we assume only that $(A, D)$ is stabilizable 
and $(A, H)$ is detectable. The law of $X_0$ may be chosen arbitrarily. 

As $(A, H)$ is detectable, it is well known that there exists a matrix $K \in \mathbb{R}^{p\times n}$ 
such that $\tilde A := A - KH$ is an asymptotically stable matrix. Fix such a matrix $K$ (it 
may not be unique, but this will not affect our final result). By Itô's rule, 

$$e^{-\tilde At}X_t = X_0 + \int_0^t e^{-\tilde As}KHX_s\,ds + \int_0^t e^{-\tilde As}D\,dW_s = X_0 + \int_0^t e^{-\tilde As}K\,dY_s^\kappa - \kappa\int_0^t e^{-\tilde As}K\,dB_s + \int_0^t e^{-\tilde As}D\,dW_s.$$
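The role of the gain $K$ can be illustrated on a toy model. The following Python sketch uses hypothetical matrices of our own choosing (they do not come from the paper): $A$ has one unstable mode that is visible through $H$, so $(A, H)$ is detectable, and a hand-picked $K$ moves that mode into the left halfplane. Eigenvalues of the $2\times 2$ matrices are obtained from the trace and determinant.

```python
import cmath


def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix computed from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return ((tr + disc) / 2.0, (tr - disc) / 2.0)


# Hypothetical model (p = 2, n = 1): the growing mode is observed through H.
A = [[1.0, 0.0], [0.0, -1.0]]
H = [1.0, 0.0]
K = [3.0, 0.0]  # one admissible gain, chosen by hand; it is not unique

# A_tilde = A - K H, where K H is the outer product of the column K and the row H.
A_tilde = [[A[i][j] - K[i] * H[j] for j in range(2)] for i in range(2)]

eig_A = eigenvalues_2x2(A)
eig_A_tilde = eigenvalues_2x2(A_tilde)
print(eig_A, eig_A_tilde)
```

Here $A$ has an eigenvalue with positive real part while both eigenvalues of $A - KH$ have negative real part, which is the property required of $\tilde A$ in the text.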
Jo Jo Jo 



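Now define 

$$\tilde X_t^\kappa := X_t - \int_0^t e^{\tilde A(t-s)}K\,dY_s^\kappa, \qquad \tilde Y_t^\kappa := \int_0^t H\tilde X_u^\kappa\,du + \kappa B_t.$$

Then evidently $\tilde X_t^\kappa$ satisfies the stochastic differential equation 

$$\tilde X_t^\kappa = X_0 + \int_0^t \tilde A\tilde X_s^\kappa\,ds + DW_t - \kappa KB_t.$$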
Jo 

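Moreover, we can compute 

$$e^{-At}\tilde X_t^\kappa = X_0 - \int_0^t e^{-As}KH\tilde X_s^\kappa\,ds - \kappa\int_0^t e^{-As}K\,dB_s + \int_0^t e^{-As}D\,dW_s = X_0 + \int_0^t e^{-As}D\,dW_s - \int_0^t e^{-As}K\,d\tilde Y_s^\kappa.$$

Thus evidently 

$$X_t = \tilde X_t^\kappa + \int_0^t e^{A(t-s)}K\,d\tilde Y_s^\kappa.$$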
Jo 



The following lemma is therefore immediate. 

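Lemma 5.6. $\sigma\{Y_t^\kappa : t \in [0,T]\} = \sigma\{\tilde Y_t^\kappa : t \in [0,T]\}$ $\mathbf{P}$-a.s. for all $T < \infty$, $\kappa > 0$. 

Proof. This follows directly from 

$$\tilde Y_t^\kappa = Y_t^\kappa - \int_0^t\int_0^s He^{\tilde A(s-u)}K\,dY_u^\kappa\,ds, \qquad Y_t^\kappa = \tilde Y_t^\kappa + \int_0^t\int_0^s He^{A(s-u)}K\,d\tilde Y_u^\kappa\,ds.$$

The proof is complete. □ 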

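We now see immediately that for every $t, \kappa > 0$ 

$$\mathbf{E}\big(\|X_t - \mathbf{E}(X_t|\mathcal{F}^{Y,\kappa}_{[0,t]})\|^2\big) = \mathbf{E}\big(\|\tilde X_t^\kappa - \mathbf{E}(\tilde X_t^\kappa|\mathcal{F}^{\tilde Y,\kappa}_{[0,t]})\|^2\big),$$

where $\mathcal{F}^{\tilde Y,\kappa}_{[0,t]}$ is defined in the obvious fashion. But $\tilde X_t^\kappa$ is an ergodic Markov process (as 
$\tilde A$ is stable), which brings us back, in principle, to the setting employed throughout 
this paper. However, note that the driving noise of $\tilde X_t^\kappa$ is correlated with the 
observation noise, so that we cannot immediately apply our previous results. 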
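Lemma 5.7. Define for $t \in \mathbb{R}$ 

$$\tilde X_t^0 = \int_{-\infty}^t e^{\tilde A(t-s)}D\,dW_s.$$

Then we have 

$$\lim_{\kappa\to 0}\lim_{t\to\infty}\mathbf{E}\big(\|X_t - \mathbf{E}(X_t|\mathcal{F}^{Y,\kappa}_{[0,t]})\|^2\big) = \mathbf{E}\big(\|\tilde X_0^0 - \mathbf{E}(\tilde X_0^0\,|\,\sigma\{H\tilde X_s^0 : s \le 0\})\|^2\big).$$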



Proof. Let us define, for $\kappa > 0$ and $t \ge 0$, the processes 

$$x_t^\kappa = x_0^\kappa + \int_0^t Ax_u^\kappa\,du + DW_t, \qquad x_0^\kappa = \int_{-\infty}^0 e^{-\tilde As}D\,dW_s - \kappa\int_{-\infty}^0 e^{-\tilde As}K\,dB_s,$$
$$y_t^\kappa = \int_0^t Hx_u^\kappa\,du + \kappa B_t, \qquad \tilde x_t^\kappa = x_t^\kappa - \int_0^t e^{\tilde A(t-s)}K\,dy_s^\kappa, \qquad \tilde y_t^\kappa = \int_0^t H\tilde x_u^\kappa\,du + \kappa B_t.$$

Then $(x_t^\kappa, y_t^\kappa)$ is a Markov process with the same transition law as $(X_t, Y_t^\kappa)$, except 
that we have chosen a specific initial law for $x_0^\kappa$ in a manner that depends on $\kappa$. However, 
as the model is stabilizable and detectable, it is well known that the stationary 
filtering error exists and is independent of the initial law. Therefore 



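$$\lim_{t\to\infty}\mathbf{E}\big(\|X_t - \mathbf{E}(X_t|\mathcal{F}^{Y,\kappa}_{[0,t]})\|^2\big) = \lim_{t\to\infty}\mathbf{E}\big(\|x_t^\kappa - \mathbf{E}(x_t^\kappa|\mathcal{F}^{y,\kappa}_{[0,t]})\|^2\big).$$

From the above discussion, it is now easily seen that in fact also 

$$\lim_{t\to\infty}\mathbf{E}\big(\|X_t - \mathbf{E}(X_t|\mathcal{F}^{Y,\kappa}_{[0,t]})\|^2\big) = \lim_{t\to\infty}\mathbf{E}\big(\|\tilde x_t^\kappa - \mathbf{E}(\tilde x_t^\kappa|\mathcal{F}^{\tilde y,\kappa}_{[0,t]})\|^2\big).$$

Here we have defined $\mathcal{F}^{y,\kappa}_{[0,t]}$ and $\mathcal{F}^{\tilde y,\kappa}_{[0,t]}$ in the obvious fashion. 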

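Now note that $\tilde x_t^\kappa$ is a stationary Markov process with the explicit representation 

$$\tilde x_t^\kappa = \int_{-\infty}^t e^{\tilde A(t-s)}D\,dW_s - \kappa\int_{-\infty}^t e^{\tilde A(t-s)}K\,dB_s.$$

Therefore $(\tilde x_t^\kappa, \tilde y_t^\kappa)$ immediately extend to all $t \in \mathbb{R}$, and by stationarity 

$$\lim_{t\to\infty}\mathbf{E}\big(\|X_t - \mathbf{E}(X_t|\mathcal{F}^{Y,\kappa}_{[0,t]})\|^2\big) = \mathbf{E}\big(\|\tilde x_0^\kappa - \mathbf{E}(\tilde x_0^\kappa|\mathcal{F}^{\tilde y,\kappa}_{]-\infty,0]})\|^2\big).$$

The proof is completed by following the same steps as in the proof of lemma 3.2, and 
noting that the Wiener process $B$ enters linearly in the expression for $\tilde x_0^\kappa$. □ 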

COROLLARY 5.8. The filter achieves maximal accuracy if and only if the matrix 
$H(\lambda I - \tilde A)^{-1}D$ has linearly independent columns for all $\lambda \in \mathbb{C}$ with $\operatorname{Re}\lambda > 0$. 

Proof. Immediate from lemma 5.7 and theorem 5.2. □ 



COROLLARY 5.9. The filter achieves maximal accuracy if and only if the matrix 
$H(\lambda I - A)^{-1}D$ has linearly independent columns for all $\lambda \in \mathbb{C}$ with $\operatorname{Re}\lambda > 0$. 

Proof. By [?, proposition 2], $H(\lambda I - A)^{-1}D$ has linearly independent columns iff 
$H(\lambda I - A + KH)^{-1}D$ has linearly independent columns, for any matrix $K$. □ 
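The invariance under the substitution $A \mapsto A - KH$ can be checked numerically in the single-input, single-output case, where the matrix inversion lemma gives $H(\lambda I - A + KH)^{-1}D = (1 + H(\lambda I - A)^{-1}K)^{-1}H(\lambda I - A)^{-1}D$ whenever both inverses exist. The Python sketch below is an illustration under our own assumptions: it reuses the matrices of the example in section 5.1 together with a hypothetical gain $K$, and the helper names `inv2`, `transfer` and `check` are ours.

```python
def inv2(M):
    """Inverse of a 2x2 (possibly complex) matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def transfer(A, B, H, lam):
    """Scalar H (lam*I - A)^{-1} B for a 2x2 matrix A, column B and row H."""
    R = inv2([[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]])
    return sum(H[i] * R[i][j] * B[j] for i in range(2) for j in range(2))


A = [[-1.0, 0.0], [0.0, -4.0]]
D = [1.0, 1.0]
H = [1.0, -2.0]
K = [1.0, 0.5]  # hypothetical gain, chosen only for illustration

# A_K = A - K H, with K H the outer product of the column K and the row H.
A_K = [[A[i][j] - K[i] * H[j] for j in range(2)] for i in range(2)]


def check(lam):
    """Return (g, g_K, scale) with g_K * scale expected to equal g."""
    g = transfer(A, D, H, lam)
    g_k = transfer(A_K, D, H, lam)
    scale = 1.0 + transfer(A, K, H, lam)
    return g, g_k, scale


g2, gk2, scale2 = check(2.0)        # the right halfplane zero of the example
gc, gkc, scalec = check(1.0 + 1.0j)  # a generic right halfplane point
print(abs(g2), abs(gk2), abs(gkc * scalec - gc))
```

Both transfer functions vanish at $\lambda = 2$, and at the generic point they differ only by the nonzero scalar factor, so linear independence of the columns is unaffected by the choice of $K$ in this example.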